(54) Analysis of digital signals

(57) A method of processing a data stream in which 
captions are transmitted, for example a television signal. 
The method comprises determining at least one quantitative
characteristic of variation in information content
used for transmission of the captions and assigning a
caption type to the data stream based on the or each 
characteristic. 
Description

[0001] The present invention relates to the field of digital
communications and provides a system and method
for analysing digital signals, particularly digital television 
signals. 

[0002] Subtitles have been supplied with broadcast 
programmes using the teletext system for many years. 
Monitoring of the content of the subtitles by parsing the 
text of the subtitles may be advantageous since it may 
allow the service provider to monitor the content of the 
programme being broadcast, or ensure that the correct 
subtitles are being broadcast for a particular pro- 
gramme. 

[0003] Subtitles and captions associated with digital 
terrestrial television (DTT) signals are transmitted as bit- 
mapped images, rather than as text as was the case in 
the prior teletext based system. Since the subtitles are 
broadcast as bit mapped images, simple monitoring of 
the subtitle content by parsing the subtitles is not pos- 
sible. 

[0004] The present invention aims to provide a meth- 
od of monitoring subtitles and captions which does not 
require parsing of the text of the subtitles. 
[0005] Aspects of the invention are outlined in the in- 
dependent claims below and preferred features are out- 
lined in the dependent claims. 

[0006] A first aspect provides a method of processing 
a data stream in which captions are transmitted, the 
method comprising determining at least one quantitative 
characteristic of variation in information content of the 
captions and assigning a caption type to the data stream 
based on the or each characteristic. 
[0007] The method advantageously allows a caption 
type to be assigned to the data stream without the cap- 
tions themselves having to be decoded and analysed. 
The caption type can be assigned based on a quantita- 
tive characteristic of the variation in information content 
of the data stream. This can be achieved relatively sim- 
ply (analysis of the images themselves or even parsing 
of the text would be complex) but is found to be surpris- 
ingly effective. The captions may be, for example, sub- 
titles, sign-language captions or audio description cap- 
tions; or a combination of different types of captions may 
be transmitted in the data stream. The system may be 
arranged to detect the presence of captions in the data 
stream and a measure of the type of captions being 
transmitted. 

[0008] According to one preferred embodiment, the 
captions are transmitted as a series of images. This may 
be the case, for example, in systems where the captions 
are subtitles transmitted as bit mapped images, such as 
in DTT signals. The methods described herein are par- 
ticularly advantageous when used with bit mapped im- 
ages, since analysing the images themselves would be 
time consuming and require significant data processing 
before an estimate of the programme type could be de- 
termined. 
[0009] A sample of quasi-instantaneous information
rate may be detected by determining a measure of information
in a first, sampling, interval. An average information
rate may be determined over a second, longer,
averaging period from a plurality of samples.

[0010] Preferably the at least one quantitative char- 
acteristic includes a measure of the bit rate associated 
with transmission of the captions. Hence the bit rate of
a component of the data stream associated with the captions
may be measured and used to detect variation in
the information content.

[0011] Where a variable bit rate is used for the captions,
e.g. in Digital Terrestrial Television, this may be
directly measured. In a teletext-type system where fixed
size frames or pages are used, the information content
can be monitored by detecting the amount of actual information
(e.g. text as opposed to space) in each frame/page
and, where applicable, the rate of update/change
of the information.

[0012] The caption type is preferably selected from
one of a plurality of pre-defined caption types. Hence,
the data stream may be classified into one out of a predetermined
set of types.

[0013] Preferably, assigning a caption type includes
identifying an error or abnormal caption condition. This
may allow a service operator to be alerted to an erroneous
or abnormal caption condition, which may indicate,
for example, a fault in the transmission of the data
stream. The identification of abnormal conditions may
highlight to the service operator any portions of the data
stream for which further or deeper analysis may be beneficial.
Hence the system may assist in fault detection
or monitoring of the signal output and may help a broadcaster
or user to police Service Level Agreements
(SLAs), as discussed further below.

[0014] Preferably, the plurality of pre-defined caption
types includes at least one of: an absence of captions,
a predetermined caption, for example an error or apology
caption or a cleardown instruction, a series of pre-prepared
block captions of at least one type, a series of
pre-prepared short captions, a sequence of live captions
and an uncertain or unidentified caption type. It is found
that these different caption types may relatively reliably
be determined from characteristics of the data stream,
as described in more detail below, and provide useful
measures of the content of the captions and data
stream.

[0015] In some instances, more than one candidate
pre-defined caption type may be assigned to the data
stream. This may be advantageous when the true caption
type is unclear and further analysis of the data
stream is necessary to determine the caption type.
[0016] According to a further, highly preferable feature,
an initial assignment of more than one candidate
caption type is refined based on the history of the data
stream. A caption type may be determined based on the
history of caption types for that data stream. For example,
the system may be arranged so that a caption type
that has frequently occurred in the recent history of the
data stream is favoured over a new caption type. This
may reduce the occurrence of sudden and incorrect
changes in the detected caption type due to transients
in the data.
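
As a rough illustration of the history-based refinement described above, the following Python sketch picks, from several candidate caption types, the one that occurred most often in a recent window. The function name `refine_caption_type`, the window length of 30 and the caption type labels are assumptions made for this example only and are not taken from the specification.

```python
from collections import Counter, deque

def refine_caption_type(candidates, history):
    """Pick the candidate seen most often in the recent history.

    candidates: non-empty list of candidate caption type labels, in order
                of initial preference.
    history:    iterable of recently assigned caption type labels.
    """
    counts = Counter(history)
    # max() returns the first maximal element, so ties keep the initial
    # preference order of the candidates.
    return max(candidates, key=lambda c: counts[c])

# Hypothetical usage: keep the last 30 assignments as the history.
history = deque(["live"] * 8 + ["block"] * 2, maxlen=30)
chosen = refine_caption_type(["block", "live"], history)
history.append(chosen)
```

Here the ambiguous sample is resolved in favour of the type that dominates the recent history, which is one simple way of damping transient changes.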

[0017] According to a preferable embodiment, the 
captions comprise subtitles. Preferably the subtitles are 
bit mapped images. 

[0018] According to a further embodiment, the cap- 
tions may comprise information for an access service 
other than subtitles, for example, the captions may com- 
prise audio description information or closed signing. 
[0019] According to a further preferable embodiment, 
the data stream comprises a television signal. 
[0020] Preferably, more than one type of caption may 
be analysed, if present. 

[0021] In one embodiment, a caption type may be as- 
signed signifying detection of presence or absence of 
the captions in place of detection of a quantitative meas- 
ure. 

[0022] A further aspect may provide a method of 
processing a data stream in which a non-textual access 
service may be transmitted, the method comprising de- 
termining variation in information content used for trans- 
mission of the access service and classifying the data 
stream based on the variation. 

[0023] Preferably, a programme type may be as- 
signed to the data stream based on the caption type as- 
signed to the data stream. Hence, the method may pro- 
vide a convenient method of determining the pro- 
gramme types in a television broadcast signal. 
[0024] Further preferably, the programme type classi- 
fication comprises at least one of: no subtitles, pre-re- 
corded, as live, live broadcast. Other programme types 
may also be incorporated into the system. In some cases,
the types listed here may be further refined by additional
statistical analysis of the data stream and optionally
other information, for example from scheduling information
or time of broadcast, e.g. a pre-recorded programme
may be further classified as, for example, (likely
to be) a drama or a children's programme.
[0025] The method preferably further comprises de- 
tecting at least one interstitial in the data stream. The 
interstitial may be due to, for example, a change in the 
programme transmitted or an advertisement break. An 
interstitial may be signified by detection of a sudden 
change in caption type (or the absence of captions) and 
may be supplemented by programme schedule information
or a real time input, optionally based on the premise
that intervals occur regularly within a programme and
typically fall predominantly at certain absolute times, e.g.
on the hour and at half past the hour.
Detection of interstitials may allow a user device or a
broadcaster to determine the beginning or end of a programme
which may, for example, trigger a recording device
to commence, pause, or cease recording of a programme.
Viewing or storage may be controlled, for example
a user may express a preference to mute sound
or change channel during advertisements or to skip
them when recording a programme.
[0026] The method preferably further comprises detecting
at least one interstitial between determined programme
types in the data stream. The method of detecting
the interstitial may further comprise determining
the nature of the interstitial, which may be due to, for
example, a change in the programme transmitted or an
advertisement break. Interstitials may be detected by
the broadcaster or at a viewer end. Detection at a viewer end
may, for example, allow a recording device to commence
or cease recording of a particular programme.
[0027] Preferably, the method further comprises detecting
whether captions are transmitted in the data
stream. The data stream may be monitored or tested at
intervals to determine whether captions are being transmitted
at that time and the caption type may then only
be determined if the data stream has accompanying
captions. This may reduce the amount of processing
necessary if captions are not transmitted with the data
stream at all times. Apparatus or a process may be arranged
to sample the data stream at intervals in a
snooze mode and to wake up into an active mode when
captions, for example subtitles or another access service,
are detected. In the case of an access service which
is transmitted only with certain programmes, a user may
be alerted to the presence of a programme containing
the access service, as discussed further below.
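
The snooze and active modes described above might be sketched as follows. This is only an illustrative simplification: the function name `monitor`, the snooze step of 10 samples and the rule of returning to snooze mode on the first zero sample are all assumptions, not details given in the specification.

```python
def monitor(samples, snooze_step=10):
    """Return the indices of the samples that were actually inspected.

    In snooze mode only every `snooze_step`-th sample is inspected. Once a
    non-zero sample is seen the monitor becomes active and inspects every
    sample until the bit rate returns to zero (an assumed wake/sleep rule).
    """
    inspected = []
    i = 0
    while i < len(samples):
        inspected.append(i)
        active = samples[i] > 0
        i += 1 if active else snooze_step
    return inspected

# Hypothetical per-second bit counts: 20 s silence, 3 s captions, 7 s silence.
samples = [0] * 20 + [500, 600, 700] + [0] * 7
inspected = monitor(samples)
```

Only six of the thirty samples are inspected in this example, which illustrates how processing may be reduced when captions are absent for much of the time.
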
[0028] As noted above, it has been appreciated that
useful information can be obtained from quantitative variation
in information content of subtitles (transmitted primarily
for the benefit of the hearing impaired). It has further
been appreciated that detection of other "access"
services, for example audio description (transmitted primarily
for the visually impaired), closed signing (transmitted
typically for the hearing impaired, typically as an
alternative in place of subtitles), etc. may be useful. Thus
the first aspect may be applied more generally to access
services and reference to caption in the context of the
first aspect is preferably not limited to textual captions
such as subtitles but preferably includes "captions"
which may comprise other forms of access service data
such as audio description information, signing and other
services, preferably "closed" services (i.e. services
which may be selectively "viewed").

[0029] It has been appreciated that some access
services are unlikely to be transmitted with all programmes,
and hence binary detection of the presence
or absence of an access service component may itself
be directly useful (as well as or instead of quantitative
analysis). For example, audio description is unlikely to
be transmitted with a data stream that contains a news
programme. In all cases, detection may be used to determine
whether the access service is present. For a viewer,
this can be used to trigger preferential recording of programmes
having the service and, for e.g. a broadcaster,
can be used to monitor correct delivery of the access
service. Thus in a closely related further aspect, the invention
provides a method of processing a (televisual
programme) data stream in which a non-textual access
service may be transmitted, the method comprising determining
variation in information content used for transmission
of the access service and classifying the data
stream based on the variation. A quantitative measure
of information content may be useful in the case of many
access services, although a fixed bit rate audio description
may require analysis of the audio to determine information
content. However, as noted above, simply determining
binary variation (presence or absence of the
access service) may be useful.

[0030] A further aspect provides a method of classi- 
fying a programme, the method comprising receiving a 
signal encoding the programme and a series of cap- 
tions, determining at least one quantitative characteris- 
tic of variation in the information content of the series of 
captions and classifying the programme into one of a 
series of predefined programme types based on the or 
each characteristic. This method may provide a conven- 
ient way of classifying programmes without requiring the 
decoding and analysis of the captions (e.g. subtitles) 
themselves. 

[0031] Preferably, the captions comprise subtitles and 
further preferably, the subtitles are bit mapped images. 
[0032] According to a highly preferable embodiment, 
the characteristic includes at least one statistical meas- 
ure of the bit rate, and is preferably a combination of at 
least two statistical measures. Using a combination of 
two or more statistical measures of the bit rate is found 
to increase significantly the accuracy of the determined 
programme type. If the statistical measure is above or 
below a predetermined threshold, however, only one 
statistical measure of the information content may be 
used to classify the programme. This may allow more 
efficient determination of the programme type when the 
first statistical measure gives a clear result above or be- 
low a predetermined threshold. 
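
A minimal sketch of this two-stage decision is given below. The threshold values, the function name `classify_programme` and the mapping from measures to programme types are assumptions chosen for illustration. The specification does not give numeric thresholds at this point.

```python
def classify_programme(avg_bit_rate, longest_zero_run_s,
                       low_rate=50.0, gap_threshold_s=3):
    """Classify a programme from one or two bit rate statistics.

    avg_bit_rate:       average caption bit rate in bits per second.
    longest_zero_run_s: longest run of contiguous zero-bit-rate seconds.
    low_rate, gap_threshold_s: illustrative, assumed thresholds.
    """
    # Stage 1: the first measure alone is decisive when it falls below
    # the threshold, so the second measure is not consulted.
    if avg_bit_rate < low_rate:
        return "no subtitles"
    # Stage 2: otherwise combine it with a second measure. Long gaps
    # between captions are assumed here to suggest pre-prepared subtitles.
    if longest_zero_run_s >= gap_threshold_s:
        return "pre-recorded"
    return "live broadcast"
```

For example, `classify_programme(0.0, 60)` returns "no subtitles" without looking at the gap length, whereas `classify_programme(800.0, 5)` needs both measures.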

[0033] Preferably, the statistical measures include at 
least one of: the average bit rate, the number of contig- 
uous seconds of bit rate above or below a threshold, the 
number of contiguous seconds of zero bit rate, the his- 
togram peak bit rate for a period, the histogram peak, 
over a predetermined period, of the number of contigu- 
ous seconds of zero bit rate and the accumulated bit 
rate. These statistical measures are described in more 
detail below, but each measure can be quickly deter- 
mined from a data stream and provides a useful indica- 
tor of the programme type. 
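
The measures listed above can be computed from a list of per-second bit counts with a few lines of Python. The sketch below is illustrative only: the function name `bit_rate_measures`, the histogram bin width of 100 bits per second and the dictionary keys are assumptions, and the threshold-based run measure is omitted for brevity.

```python
from collections import Counter
from itertools import groupby

def bit_rate_measures(samples, bin_width=100):
    """Compute simple statistics from per-second bit counts.

    samples:   non-empty list of bits counted in each one-second interval.
    bin_width: histogram bin width in bits per second (an assumption).
    """
    accumulated = sum(samples)
    average = accumulated / len(samples)
    # Length of each run of contiguous zero-bit-rate seconds.
    zero_runs = [len(list(group))
                 for is_zero, group in groupby(samples, key=lambda s: s == 0)
                 if is_zero]
    # Histogram peak: lower edge of the most populated bit rate bin.
    bins = Counter((s // bin_width) * bin_width for s in samples)
    peak_rate = bins.most_common(1)[0][0]
    # Histogram peak of the zero-run lengths (None if there are no gaps).
    peak_zero_run = (Counter(zero_runs).most_common(1)[0][0]
                     if zero_runs else None)
    return {
        "average": average,
        "accumulated": accumulated,
        "longest_zero_run": max(zero_runs, default=0),
        "histogram_peak_rate": peak_rate,
        "histogram_peak_zero_run": peak_zero_run,
    }

# Hypothetical ten-second sample resembling bursts separated by gaps.
m = bit_rate_measures([0, 0, 900, 0, 0, 950, 0, 0, 0, 920])
```

Each measure needs only a single pass over the samples, which is consistent with the statement that the measures can be determined quickly from the data stream.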

[0034] Preferably, the method further comprises de- 
riving a measure of confidence for the determined pro- 
gramme classification. The determined programme 
classification may be made more useful if the user, typ- 
ically in this case a service operator, is also provided 
with a confidence measure for the classification. The 
measure of confidence may be based on how closely 
the statistical measure for the data fits the expected pattern
for the determined programme type or for any programme
type.

[0035] Further preferably, the measure of confidence 
is updated as more data is received in the signal. If the 
determined programme type remains constant for a 
large portion of data, then the confidence level may in- 
crease. Hence, the history of the determined pro- 
gramme type for the data may be important in determin- 
ing the confidence level. 
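
One possible way to let the confidence grow while the classification stays constant is an exponential approach towards full confidence, reset when the classification changes. The update rule, the `gain` and `floor` values and the function name `update_confidence` are assumptions for this sketch. The specification does not prescribe a formula.

```python
def update_confidence(confidence, previous_type, new_type,
                      gain=0.1, floor=0.2):
    """Return an updated confidence value in the range [0, 1]."""
    if new_type == previous_type:
        # Move a fraction `gain` of the way towards full confidence.
        return confidence + gain * (1.0 - confidence)
    # The classification changed, so fall back to a low starting value.
    return floor

# Hypothetical usage: three consecutive sections classified as "live".
conf, prev = 0.2, "live"
for programme_type in ["live", "live", "live"]:
    conf = update_confidence(conf, prev, programme_type)
    prev = programme_type
```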

[0036] According to a further preferable embodiment, 
the determined classification has hysteresis character- 
istics and is preferably partly determined based on the 
history of the quantitative characteristic. Hence the pro- 
gramme type determined for the previous section of da- 
ta preferably influences the programme type deter- 
mined for the next section of data. This may encourage 
consistency in the programme type determined for the 
data. 

[0037] According to a related feature, sudden and/or
highly transient changes in the determined programme
classification may be suppressed or processed as exceptions.
This is advantageous, since a brief transient
(e.g. about 10 (or 30) seconds or less) change in the
programme type is unlikely. Longer transients (e.g.
more than about 10 (or 30) seconds) up to a few minutes
(e.g. up to 5 (or in some cases 10) minutes) may be detected
as an interstitial (e.g. commercial break) as discussed
below. The programme type may change from
one programme to the next, but these longer term
changes would preferably not be suppressed. A change
back from the interstitial characteristic to the previous
classification may be detected as resumption of a programme
and a change to a new characteristic may be
detected as commencement of a new programme.
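
The duration-based handling of transients can be sketched as a run-length pass over a per-second sequence of determined programme types. The 10 second and 300 second limits follow the example figures in the text. The function name `label_runs`, the labels and the simplified rule for new programmes are assumptions.

```python
from itertools import groupby

def label_runs(types, transient_s=10, interstitial_s=300):
    """Label runs in a per-second sequence of programme types.

    Returns (type, duration_s, label) tuples, where label is 'programme',
    'suppressed' (brief transient) or 'interstitial' (medium-length change).
    """
    runs = [(t, len(list(group))) for t, group in groupby(types)]
    labelled = []
    current = None  # type of the programme currently believed to be on air
    for t, n in runs:
        if current is None or t == current:
            current, label = t, "programme"   # start or resumption
        elif n <= transient_s:
            label = "suppressed"              # too brief to be a real change
        elif n <= interstitial_s:
            label = "interstitial"            # e.g. a commercial break
        else:
            current, label = t, "programme"   # long-lived change: new programme
        labelled.append((t, n, label))
    return labelled

# Hypothetical stream: programme, 3 s glitch, programme, 2 min break,
# programme resumes, then a different programme starts.
stream = (["pre-recorded"] * 600 + ["live"] * 3 + ["pre-recorded"] * 100 +
          ["none"] * 120 + ["pre-recorded"] * 200 + ["live"] * 400)
labels = [label for _, _, label in label_runs(stream)]
```

A practical system would apply these rules incrementally as samples arrive. The batch form is used here only to keep the example short.
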
[0038] Preferably, an initial set of parameters is
stored for classifying the programme. Hence the initial
programme type may be predefined or determined be- 
fore the data stream analysis starts. 
[0039] Preferably, the initial parameters are modified 
based on the received data. This may allow the deter- 
mined programme type to change as data is received. 
[0040] According to a further preferable embodiment, 
the method further comprises monitoring the content of 
the programme viewed based on the programme type 
assigned to the data stream. 

[0041] All the above methods may advantageously be
used to monitor programme output. The above methods 
may further comprise monitoring programme output 
based on analysis of captions and/or classification 
based on a non-textual access service. This method 
may be followed by reporting the results of monitoring 
a plurality of channels and/or programmes to a user in- 
terface. The method may further comprise triggering an 
alarm based on the results of monitoring. A live alarm 
may be triggered during broadcast of a monitored pro- 
gramme. Advantageously a report on broadcast of a plu- 
rality of programmes following broadcast may be com- 
piled. These methods may be used by a broadcaster to 
monitor output. 
[0042] The analysis may be supplemented by information
available from another source of programme information,
for example schedule information. In the case
of a broadcaster monitoring output, the analysis method/apparatus
may receive schedule information including
information indicating the presence or absence of
captions of a particular format (e.g. subtitles or other access
service) and monitoring may include determining
whether scheduled captions are present. The schedule
information may include an indication of programme
type and/or programme name. Analysis may include fitting
caption type to programme type, or to an individual
programme, and may include refining parameters associated
with caption analysis based on input concerning
programme type or particular programmes.
[0043] The results of the refinement may be stored or
communicated for use in subsequent analysis. The
analysis method may include performing analysis based
on a plurality of measures, wherein a set of classification
parameters is set to initial values and wherein the classification
parameters are updated following analysis.
Classification parameters may be updated by a user (e.g.
a broadcaster) who has full or detailed knowledge of
the programme being analysed and the classification
parameters may be communicated to an analysis engine,
for example at a viewer site, (dynamically, by periodic
update or on initial configuration) to assist in analysis.

[0044] By providing additional information to the analysis
method/apparatus, an expert system can be developed
which "learns" to recognise programme types or
individual programmes and can be used to detect correct
transmission of a schedule or to detect faults. Even
if the system is not able to provide complete analysis, it 
may nonetheless provide a useful tool to a broadcaster. 
For example, an analysis engine may analyse output 
and the output may be recorded. The analysis engine 
may provide a log of determined programme type and 
may log uncertainties or errors in the captions. By cross- 
referencing the log with the recording, the amount of the 
recording to be checked to verify acceptable delivery 
may be reduced. Particularly if the analysis engine is 
provided with schedule information, a log may be pro- 
vided of points where analysis is consistent with the 
schedule information and where there are discrepan- 
cies or uncertainties or caption errors. This may be dy- 
namically linked to the recording so that an operator may 
jump through the recording based on the analysis to ver- 
ify such uncertainties manually. 

[0045] In a further aspect, the invention provides ap- 
paratus for analysing programme output comprising 
means for analysing captions or an access service in 
conjunction with other programme information identify- 
ing expected programme output and means for report- 
ing a potential discrepancy between the results of anal- 
ysis of the captions or access service and the expected 
programme output. The invention further provides cor- 
responding method and computer program aspects. 



[0046] As noted above, the content of the programme
being viewed may be monitored by the service operator.
This analysis may be performed at a broadcast or network
point or remotely via a monitoring device receiving
the broadcast service. Monitoring may be impersonal,
used to check delivery of a service, or personal, which
may allow the service operator to create a profile of one
or more individual service users which may be used, for
example, to target advertising material sent to the user.
[0047] In addition, or alternatively, to monitoring by a
service operator, the programme type may be monitored
locally at the user site. According to one embodiment,
this may allow a user device, for example a video storage
or recording device, to build up a profile of the service
user based on the programmes watched. The profile
information may then be used, for example to inform a
user when a particular type of programme is showing or
to record certain types of programmes automatically.
[0048] Preferably, the method further comprises storing
a programme or controlling storage of programmes
based on the programme type determined. This may allow
a video recording or storage device to store programmes
corresponding to a particular programme
type.

[0049] According to a preferable feature, the method
further comprises receiving further input concerning the
expected programme types, such as schedule information.
Hence the determined programme type may be
compared to the expected programme type. A broadcaster
or user may be alerted to any anomalies detected.
This may be used by a broadcaster to detect faults
in the programme output or signal multiplex or faults in
the transmission of captions such as subtitles, signing
or audio description captions.
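
The comparison of determined and expected programme types can be illustrated with a short Python sketch. The slot labels, the dictionary representation of the schedule and the function name `find_anomalies` are hypothetical and chosen only for this example.

```python
def find_anomalies(schedule, detected):
    """Compare expected and determined programme types per time slot.

    schedule: dict mapping a slot label to the expected programme type.
    detected: dict mapping a slot label to the determined programme type.
    Returns a list of (slot, expected, found) tuples for every mismatch.
    """
    anomalies = []
    for slot, expected in schedule.items():
        # A slot with no determination at all is reported as "unknown".
        found = detected.get(slot, "unknown")
        if found != expected:
            anomalies.append((slot, expected, found))
    return anomalies

# Hypothetical schedule and analysis results.
schedule = {"18:00": "live broadcast", "18:30": "pre-recorded"}
detected = {"18:00": "live broadcast", "18:30": "no subtitles"}
alerts = find_anomalies(schedule, detected)
```

Each entry in `alerts` identifies a point at which a broadcaster or user might be alerted, for instance because scheduled subtitles appear to be missing.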

[0050] Preferably, the method further comprises storing
and updating a list of user preferences based on the
determined programme type. The user preference list
may be used to build up a profile of the user, which may
be used by the service provider to provide additional
content, such as targeted information, to the user.

[0051] A further aspect provides apparatus for
processing a data stream in which captions are transmitted,
the apparatus comprising:

means for receiving the data stream;

means for determining at least one quantitative
characteristic of variation in information content
used for transmission of the captions;

means for assigning a caption type to the data
stream based on the or each characteristic.

[0052] According to one embodiment, the captions
are transmitted as a series of images.
[0053] Preferably, the at least one quantitative characteristic
includes a measure of bit rate associated with
the captions.

[0054] As discussed above, this may provide means
for analysing captions within a data stream without rendering
the captions and decoding them for analysis.
[0055] Preferably, the apparatus further comprises 
means for storing the determined bit rate value for a pre- 
determined initialisation period. Hence the history of the 
bit rate of the data stream may be taken into account in 
assigning the caption type to the data stream. 
[0056] Preferably, the quantitative characteristic com- 
prises at least one of: 

the average bit rate; 

the number of contiguous seconds of bit rate above 
or below a threshold; 

the number of contiguous seconds of zero bit rate; 

the histogram peak bit rate for a period; 

the histogram peak, over a predetermined period,
preferably the initialisation period, of the number of
contiguous seconds of zero bit rate;

the accumulated bit rate. 

[0057] These statistical measures, or preferably a 
combination of the measures, may easily be calculated 
from the determined bit rate as described herein and 
may provide useful indicators of the caption type. 
[0058] Preferably, the data stream is a digital televi- 
sion signal and the captions are subtitles. Hence a dig- 
ital television service provider may monitor the content 
of the subtitles sent out with programmes, or of the pro- 
grammes themselves. According to an alternative em- 
bodiment, the captions may be audio description cap- 
tions or sign-language captions, or another non-textual 
access service. 

[0059] In one embodiment, a binary determination 
may be used in place of said quantitative measure. 
[0060] Preferably, the apparatus further comprises 
means for determining a programme type based on the 
assigned caption type for the data stream. 
[0061] According to a further preferable feature, the 
caption type determined is at least one of: 

an absence of captions; 

a predetermined caption, for example an error or 
apology caption or a cleardown instruction; 
a series of pre-prepared block captions of at least 
one type; 

a series of pre-prepared short captions; 

a sequence of live captions; 

an uncertain or unidentified caption type. 

[0062] According to a further preferable feature, the 
apparatus further comprises video storage means. This 
may allow programmes, or parts of programmes to be 
stored by the user or by the service provider. 
[0063] Preferably, the apparatus further comprises 
means for storing a list of user preferences based on 
the determined programme types for the programmes 
viewed. 

[0064] This feature may be provided in conjunction
with the video storage feature or may be provided independently.

[0065] Preferably the apparatus further comprises 
means for updating the user preference list according 
to the programme type viewed. 
[0066] Features of the method aspects may be applied
to the apparatus aspects and vice versa and features
of method and apparatus aspects may be applied
to other method and apparatus aspects.
[0067] A further aspect provides a computer program 
or computer program product for carrying out a method 
according to any of the method aspects or any of their 
optional features. 

[0068] An embodiment of the invention will now be de- 
scribed in more detail with reference to the figures in 
which: 

Figure 1a to Figure 1g show examples of possible 
bit rate distributions for a data stream over a number 
of periods according to one embodiment; 
Figure 2 illustrates the relationship between several 
subtitle genres and quantities determined by statis- 
tical analysis according to one embodiment; 
Figures 3a and 3b provide an example of data an- 
alysed according to one embodiment of the systems 
and methods described herein; 
Figures 4a and 4b provide a further example of data 
analysed according to one embodiment of the sys- 
tems and methods described herein; 
Figures 5a and 5b provide a further example of data 
analysed according to one embodiment of the sys- 
tems and methods described herein. 

[0069] The invention is described below with refer- 
ence to a data stream in which subtitles for Digital Ter- 
restrial Television (DTT) are transmitted. However, the 
description is not intended to be limiting and the inven- 
tion may be applied to other types of captions such as 
audio description captions or sign-language captions 
transmitted in conjunction with other types of signal. 
[0070] As discussed above, subtitles for Digital Ter- 
restrial Television (DTT) are transmitted as bit-mapped 
images, rather than the traditional characters of a tele- 
text system. Full monitoring of subtitle content therefore 
generally requires the decoding of Digital Video Broad- 
cast (DVB) subtitles and other digital system subtitles 
and then rendering of the subtitles in some legible form. 
For a multichannel multiplex, monitoring content can 
therefore require a considerable amount of technology 
and/or human interaction. The systems and methods 
described herein provide a practical means for the determination
of the broad nature of the subtitles (for example,
are there subtitles present and are they consistent
with the currently broadcast programme?) without
directly checking the subtitle textual content. This allows 
first-line monitoring of the subtitles in a simple, fast and 
cost-effective manner. 

[0071] Subtitles are transmitted on a just-in-time basis
and so the bit rate usage varies according to application
and in particular, according to the type of programme
being broadcast. For example, a drama can
have subtitles that are up to three lines long being trans- 
mitted once every few seconds, whereas a news bulletin 
will usually have live subtitling in which individual words 
are being transmitted almost at the rate of speech and 
so the bit rate stream is continual. Furthermore, during 
times when nothing is being said (or when no subtitles 
have been scheduled), a "cleardown" message is often 
transmitted to remove the last subtitle. Also, apology 
captions may be transmitted when a fault has been de- 
tected upstream of the subtitle inserter. However, no bit 
rate being transmitted is not necessarily a fault condi- 
tion. 

[0072] It is noted that the captions may be transmitted 
at a constant bit rate and other statistical measures of 
the information content (for example identification of "re- 
al" information as opposed to padding) may be meas- 
ured and used to determine the programme type. 
[0073] All these subtitling conditions have different 
cadences through their differing bit rates and timing be- 
tween subtitles. The cadence of a subtitle may be char- 
acterised by the number of bits transmitted in a prede- 
termined sampling period and the length of the gaps be- 
tween subtitle transmissions. 
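The second part of this characterisation, the length of the gaps between subtitle transmissions, can be sketched as follows. This is a minimal illustration only: the function name `gap_lengths` and the representation of the stream as a list of per-period bit counts are assumptions made for the example and do not appear in the description.

```python
def gap_lengths(bits_per_period):
    """Return the lengths of the gaps between subtitle transmissions.

    `bits_per_period` holds the number of bits seen in each sampling
    period. A gap is a run of zero-bit periods with data on both sides;
    leading and trailing silences are ignored (an assumption).
    """
    gaps = []
    run = 0
    seen_data = False
    for bits in bits_per_period:
        if bits == 0:
            if seen_data:
                run += 1
        else:
            if run:
                gaps.append(run)
            run = 0
            seen_data = True
    return gaps


# Example: bursts separated by a three-period gap and a one-period gap.
print(gap_lengths([0, 3008, 0, 0, 0, 1504, 0, 4512, 0]))
```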

[0074] The predetermined sampling period is prefer- 
ably of the order of 1 second, but this may be made 
shorter or longer depending on the system requirements 
or on the type of programme detected or expected. A 
shorter sampling period may increase the accuracy of 
the programme type determined, but may require a 
higher data processing speed for the system. A longer 
data sampling period may be set, for example if the pro- 
gramme is not expected to contain subtitles. The sam- 
pling period may also vary whilst the system is in oper- 
ation, for example, the sampling period may increase if 
there is a long period with a low or periodic bit rate, or 
the sampling period may decrease, for example if the 
bit rate becomes high or aperiodic. 
[0075] As described in more detail below, the samples 
obtained in each sampling period are themselves pref- 
erably averaged over a second, longer predetermined 
averaging period, e.g. over 10 seconds, to determine a 
moving average for the data stream. This may be used 
alone or in conjunction with other measures, as de- 
scribed below, to determine a programme type or a cap- 
tion type for the data stream. 

[0076] A number of examples of cadences are shown 
in Figures 1a to 1g and the cadences and possible
interpretations of the cadences are discussed in more
detail below.

[0077] Figure 1a shows a cadence for a zero bit rate.
No bits of data are detected in any sampling period. De- 
tection of a zero bit rate could indicate that there are no 
subtitles or captions associated with the programme or 
that there is a fault with the subtitle or caption service. 
If a zero bit rate is detected over an extended period, or 
is detected when it is not expected, then the system
could be configured to send a warning signal to the
service operator.

[0078] Figure 1b shows a cadence for a periodic signal
with low peak rates. Such a signal may be generated
by a subtitle "cleardown" signal, which may be sent to
ensure that no subtitles or captions are displayed when
none are scheduled. Alternatively, the cadence may be
generated at an interval in the programme, when there
are no subtitles or captions but there may be on-screen
graphics generating a periodic low peak rate signal.
[0079] The cadence of Figure 1c has a periodic data
rate with moderate peak rates. This may be generated,
for example, by an apology message, which may be
transmitted when a fault condition has been detected
upstream of the subtitle or caption inserter.

[0080] Figure 1d shows an aperiodic cadence with a
low or moderate average bit rate, but with high peak bit
rates. This type of cadence may be typical of block
subtitles or captions, which may be broadcast in conjunction
with a prerecorded programme which contains dialogue,
for example, a drama.

[0081] Figure 1e also shows an aperiodic cadence,
but with a low average bit rate and moderate peak bit
rates. This may also be interpreted as block subtitles or
captions, but the subtitles are more likely to be
broadcast in conjunction with, for example, a children's
programme, which is not as dialogue-intensive as a drama
programme.

[0082] The cadence of Figure 1f is also aperiodic and
has a low or moderate average bit rate and a moderate
or high peak bit rate. This may correspond to a live
broadcast, such as a live news programme.
[0083] Figure 1g shows a semi-periodic cadence with
a low or moderate average bit rate and high peak bit
rates. This may correspond to as-live subtitles or
captions (e.g. sign-language captions), such as those
broadcast with prepared parts of a news report.
[0084] As suggested above, analysis of the bit rate usage
for DVB subtitles, measured from the transmitted
stream, may be performed to determine a cadence. The
cadence may then be used to infer something of the
nature of the subtitles or captions.

[0085] Data may be gathered by any technique that
is able to analyse a DVB transport stream and that will
produce statistics about the number of packets for each
service component that have been transmitted within
each sampling period. The sampling period can be
user-defined and may typically be one second. The data may
then be analysed by use of an algorithm which may
include further averaging over a longer averaging period,
typically of the order of 10 seconds.
[0086] According to one embodiment, the algorithm
handles one subtitle component within the transport
stream, hence each subtitle stream uses its own private
invocation.

[0087] One example of the statistical analysis of the 
subtitles will now be described in more detail. According 
to this embodiment, the technique involves measuring
the bit rate used for DVB subtitles for any service over
each averaging period to generate at least one statisti- 
cal indicator. Different combinations of these indicators 
can be used to identify subtitle genres. 
[0088] Four statistical indicators are described in 
more detail below. Further indicators may also be used 
to identify subtitle genres and a combination of two or 
more of the indicators may also be used to identify the 
genre or to verify the identification. Each of the indica- 
tors described includes an historical element in its final 
value. On running the algorithm, the user may define an 
initialisation value. The depth of history for each indica- 
tor may then be established from this value. The indica- 
tors may then be recalculated to allow subtitle genres to 
be established for every new set of subtitle statistics. 

Statistical indicators used on the data may include: 

[0089] Average Bit Rate: The average bit rate may be 
calculated as a moving average taken over a period giv- 
en by the initialisation value. For example, if the initiali- 
sation value is 10 seconds, then the average bit rate is 
the average of the bit rates calculated for each of the 
last 10 predefined sampling periods if the predefined 
sampling period is 1 second. 
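A moving average of this kind can be sketched as below. The class name `MovingAverageBitRate` is an assumption for illustration, as is the choice to average over however many samples are available before the window has filled; the description does not specify the start-up behaviour.

```python
from collections import deque


class MovingAverageBitRate:
    """Moving average over the last `initialisation_value` sampling periods."""

    def __init__(self, initialisation_value=10):
        # A bounded deque discards the oldest sample automatically.
        self._window = deque(maxlen=initialisation_value)

    def update(self, bits_in_period):
        """Add the bit count for one sampling period and return the average."""
        self._window.append(bits_in_period)
        # Assumption: before the window is full, average what is available.
        return sum(self._window) / len(self._window)


average = MovingAverageBitRate(initialisation_value=3)
for bits in (3000, 0, 3000, 0):
    print(average.update(bits))
```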

[0090] Contiguous Seconds of Zero Bit Rate: The 
contiguous seconds of zero bit rate may be defined as 
a count of the number of predefined sampling periods 
since any bit rate was received. It may be defined as 
zero for any period during which one or more packets 
were received, otherwise, it may count upwards for each 
iteration of the algorithm until the next packet is re- 
ceived. 
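The counter described above amounts to a single update rule per iteration. A minimal sketch, in which the function name `update_zero_run` is hypothetical:

```python
def update_zero_run(previous_count, packets_in_period):
    """Return the contiguous-zero count after one more sampling period.

    The count is zero whenever at least one packet was received in the
    period; otherwise it grows by one per iteration.
    """
    return 0 if packets_in_period > 0 else previous_count + 1


count = 0
history = []
for packets in (2, 0, 0, 1, 0):
    count = update_zero_run(count, packets)
    history.append(count)
print(history)
```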

[0091] Histogram Peak for Previous Contiguous Sec- 
onds of Zero Bit Rate: The histogram peak for previous 
contiguous seconds of zero bit rate may be defined as 
the peak in a Probability Distribution Function (PDF) of 
recent contiguous seconds of zero bit rate. A PDF may 
be created by counting the number of recent contiguous 
seconds of zero bit rate. The peak in the histogram
corresponds to the most common value for contiguous
seconds of zero bit rate, in other words, the one with the
highest probability. 

[0092] The number of values used to make the PDF 
may be defined by the initialisation value. Every time a 
break in the subtitle stream ends, the value of contigu- 
ous seconds of zero bit rate may be stored (in place of 
the oldest value in the list). A value of zero is preferably 
also stored for every ten seconds of contiguous seconds 
of non-zero bit rate. Similarly, a value of 20 may be 
stored for every twenty seconds of contiguous seconds 
of zero bit rate. This may be done to ensure that long 
periods of activity (or inactivity) in subtitle bit rate do not 
skew the indicator. This means that fresh contiguous 
seconds of zero bit rate are not then compared with very 
old ones and that the maximum number of histogram 
bins for the PDF is restricted. 
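One possible reading of the storage rules above is sketched here. The class name `ZeroRunHistogram` and its parameters are assumptions for illustration. In particular, restarting the zero-run count after a value of 20 has been stored, and resolving ties in favour of the value stored earliest, are interpretations that the description does not confirm.

```python
from collections import Counter, deque


class ZeroRunHistogram:
    """Tracks recent zero-bit-rate run lengths and reports the most common."""

    def __init__(self, initialisation_value=10, active_chunk=10, idle_chunk=20):
        # Bounded list: a new value replaces the oldest one when full.
        self._values = deque(maxlen=initialisation_value)
        self._active_chunk = active_chunk
        self._idle_chunk = idle_chunk
        self._zero_run = 0
        self._active_run = 0

    def update(self, packets_in_period):
        if packets_in_period > 0:
            if self._zero_run > 0:
                # A break in the subtitle stream has just ended.
                self._values.append(self._zero_run)
                self._zero_run = 0
            self._active_run += 1
            if self._active_run == self._active_chunk:
                # Store a zero for every ten periods of continuous activity.
                self._values.append(0)
                self._active_run = 0
        else:
            self._active_run = 0
            self._zero_run += 1
            if self._zero_run == self._idle_chunk:
                # Store 20 for every twenty periods of continuous silence
                # (assumption: the run count then restarts).
                self._values.append(self._idle_chunk)
                self._zero_run = 0
        return self.peak()

    def peak(self):
        """Most common stored value, or None if nothing is stored yet."""
        if not self._values:
            return None
        return Counter(self._values).most_common(1)[0][0]


histogram = ZeroRunHistogram(initialisation_value=5)
for packets in (1, 0, 0, 1, 0, 0, 1, 0, 1):
    peak = histogram.update(packets)
print(peak)
```

Capping the stored value at the idle chunk size keeps the number of histogram bins bounded, which matches the stated aim of restricting the maximum number of bins.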

[0093] Accumulated Bit Rate: The accumulated bit
rate may be defined as the total sum of subtitle packets
received since the last period of contiguous zero bit rate.
New packets may be added to the accumulated bit rate
total until there is an averaging period with zero bit rate.
At this point, the accumulated bit rate may be reset to
zero.
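The accumulate-and-reset behaviour can be sketched in a few lines. The function name `update_accumulated` is hypothetical, and the total is kept in packets, as in the definition above.

```python
def update_accumulated(accumulated_packets, packets_in_period):
    """Return the accumulated packet total after one more period.

    The total grows while packets keep arriving and is reset to zero
    by any period in which no packets were received.
    """
    if packets_in_period == 0:
        return 0
    return accumulated_packets + packets_in_period


total = 0
totals = []
for packets in (2, 3, 0, 4):
    total = update_accumulated(total, packets)
    totals.append(total)
print(totals)
```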

[0094] According to the present embodiment, the subtitle
genre is tested whenever a new set of subtitle
statistics has been released. To begin with, it is preferably
acknowledged that no subtitle genre has been identified
and so the genre may be defined as "uncertain".
Comparisons may then be made between the incoming
statistics and the conditions for each genre. From this, the
subtitle genre can be identified. Where no suitable
comparison can be made, the genre may remain "uncertain".
A record of previous subtitle genres may be kept and
the current genre may be added to the record (preferably
in the place of the oldest genre). The most common
genre within the record may then be selected as the
genre for the data stream.
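The comparison step, in which the incoming statistics are checked against the conditions for each genre and the genre stays "uncertain" if nothing matches, might be sketched as follows. The limits in `EXAMPLE_LIMITS`, the indicator names and the function name `match_genre` are invented purely for illustration; they are not the limits used in any embodiment.

```python
# Hypothetical limits for illustration only; NOT values from the description.
# Each condition is an inclusive (low, high) range for one indicator.
EXAMPLE_LIMITS = {
    "Cleardown": {"average_bit_rate": (1, 2000), "zero_seconds": (0, 5)},
    "Live": {"average_bit_rate": (2001, 95000), "zero_seconds": (0, 2)},
}


def match_genre(stats, limits):
    """Return the first genre whose conditions all hold, else "uncertain"."""
    for genre, conditions in limits.items():
        if all(low <= stats[name] <= high
               for name, (low, high) in conditions.items()):
            return genre
    return "uncertain"


print(match_genre({"average_bit_rate": 1500, "zero_seconds": 3}, EXAMPLE_LIMITS))
print(match_genre({"average_bit_rate": 30000, "zero_seconds": 8}, EXAMPLE_LIMITS))
```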

[0095] Figure 2 shows typical limits for the statistics
which may be chosen for a number of subtitle genres
according to one embodiment of the present invention.
These limits are provided by way of example only and
are not limiting. Limits for further subtitle genres may
also be defined in a similar manner. It should be noted
that these values are for an initialisation value of 10 with
an averaging period of 1 second. They should be
changed for any other initialisation value or averaging
period and may also need to be re-evaluated depending
on the DVB (and other digital systems) multiplex. Also
note that one packet equals 1504 bits. The limits shown
were calculated for a BBC DTT multiplex carrying five
services; one at a constant bit rate and a further four
services in a statistical multiplex bundle.

[0096] Tighter limits may be chosen if greater certainty
is required in the definition of the subtitle genre,
although this is likely to lead to more subtitles being
classed as "uncertain". The "Overload" genre may be
selected when the average bit rate exceeds a predefined
limit, in this example, 95 kilobits (kbits) per second
(one second being the predefined averaging period in
this example).
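Since the statistics are gathered as packet counts, the "Overload" test implies a conversion from packets to bits using the figure of 1504 bits per packet given above. A minimal sketch, assuming that one kilobit means 1000 bits and using the hypothetical function name `is_overload`:

```python
BITS_PER_PACKET = 1504  # one packet equals 1504 bits, as noted above
OVERLOAD_LIMIT_BITS_PER_SECOND = 95_000  # assumes 1 kbit = 1000 bits


def is_overload(average_packets_per_second):
    """True if the average bit rate exceeds the example overload limit."""
    average_bits = average_packets_per_second * BITS_PER_PACKET
    return average_bits > OVERLOAD_LIMIT_BITS_PER_SECOND


# 63 packets/s is 94,752 bit/s; 64 packets/s is 96,256 bit/s.
print(is_overload(63), is_overload(64))
```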

[0097] According to the present embodiment, the algorithm
takes half the initialisation value in number of
iterations to react to a change in subtitling. A typical
moving average period of the order of 10 seconds may
be used. Longer moving average periods may flatten the
distinctions between genres and may also be slower in
reacting to sudden changes in genre. As subtitles can
change within a programme or at a programme boundary,
it may be useful to set the initialisation value to reflect
this. For a monitoring situation, sudden changes
can reflect a critical condition that needs urgent attention.
Shorter periods would be less accurate because
they consider fewer values. When compared to an optimum
moving average period, the longer or shorter the
averaging period becomes, the less accurate the subtitle
genre indication becomes. Selecting a suitable
initialisation value involves determining the ideal
compromise between accuracy and efficiency of the algorithm.
[0098] The most recently established subtitle genres 
from previous iterations of the algorithm may be kept 
and the length of this record may be taken from the ini- 
tialisation value. The most common genre within the 
record may then be selected as the result of that iteration 
of the algorithm. A confidence measure is preferably al- 
so determined for the selected genre. The confidence 
measure in the result may be presented as the percent- 
age of votes that the subtitle genre received to be con- 
sidered most common. 
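The record of recent genres and the vote-based confidence measure can be sketched as below. The class name `GenreRecord` is an assumption for illustration, and ties are resolved in favour of the genre that entered the record first, which the description does not specify.

```python
from collections import Counter, deque


class GenreRecord:
    """Keeps the most recent genres and reports the most common one."""

    def __init__(self, initialisation_value=10):
        # The newest genre takes the place of the oldest when full.
        self._record = deque(maxlen=initialisation_value)

    def add(self, genre):
        """Store a genre; return (selected genre, confidence in percent)."""
        self._record.append(genre)
        winner, votes = Counter(self._record).most_common(1)[0]
        confidence = 100.0 * votes / len(self._record)
        return winner, confidence


record = GenreRecord(initialisation_value=5)
for genre in ["Prepared or As-live"] * 5 + ["Cleardown"] * 2:
    result = record.add(genre)
print(result)
```

With a record length of five, two "Cleardown" results leave the earlier genre selected with three of the five votes. The winning genre can change only once a newcomer outnumbers it within the record, which is what suppresses brief changes in the output.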

[0099] Examples of typical system output are shown 
in Figures 3 to 5. The average bit rate, the accumulated
bit rate, the number of contiguous seconds of zero bit
rate, the genre determined and the confidence factor are
shown for each averaging period (1 s in this case). In
the first example, shown in Figure 3, the genre deter- 
mined for the programme is "Live". Due to the history of 
the bit stream being taken into account, the determined 
genre does not change even when the bit rate tempo- 
rarily falls to zero. 

[0100] In the example of Figure 4, the subtitle genre 
is determined as "Prepared or As-live" with a number of 
"Cleardown" signals detected during the sample. The 
confidence level for the "Prepared or As-live" determination
falls from 100% to 60% as the "Cleardown" signals
begin to be detected.

[0101] The determined genre of the sample of Figure 
5 changes from an "Apology Caption" to "Prepared or 
As-live" and then becomes "Uncertain" before changing 
to a "Cleardown" signal. 

[0102] As shown in the examples given above, subti- 
tling cadences can be extracted from raw subtitle statis- 
tics allowing determination of the subtitle genre without 
the need to decode the transport stream. The process 
may be implemented independently of metadata and 
other influences. Genetic algorithms may be used to al- 
low the algorithms to be applied to other multiplexes. 
[0103] The system may be used as part of a monitor- 
ing system for operational areas and for output monitor- 
ing, making it a useful monitoring system of digital serv- 
ices, even those that have already been prepared for 
transmission. 

[0104] The system may also be used on receipt of the
digital transmission, for example to monitor service pro- 
vision or to advise a user of the type of programme that 
is being received on a particular channel. According to 
a further embodiment, the system may be incorporated 
into user equipment, for example a recording device, 
such as a video or DVD recorder. When the recording 
device is programmed, the programme type may be de- 
termined, either from the user or from a preprogrammed 
schedule, which may be accessed remotely by the re- 
cording device. The recording device may then monitor 
the subtitle data, as described above, and may only begin
recording when the subtitle genre determined
matches the programme type expected. This system
may also be used, for example, to avoid recording
breaks in the programme, such as commercial breaks,
which may be detected by monitoring of the subtitle
genre.

[0105] The description of the embodiment above is 
not intended to be limiting in any way and changes may 
be made to the embodiment described without depart- 
ing from its scope as defined by the following claims. 
The features disclosed in the description and in the
claims may be incorporated in the invention independ- 
ently or in combination with other features disclosed. 

Claims

1. A method of processing a data stream in which captions
are transmitted, the method comprising determining
at least one quantitative characteristic of
variation in information content used for transmission
of the captions and assigning a caption type to
the data stream based on the or each characteristic.

2. A method according to Claim 1 wherein the captions
are transmitted as a series of images.

3. A method according to Claim 1 or 2 wherein said at 
least one quantitative characteristic includes a 
measure of bit rate associated with said caption. 

4. A method according to any preceding claim wherein
the caption type is selected from one of a plurality
of pre-defined caption types, preferably wherein
assigning a caption type includes identifying an error
or abnormal caption condition, and further preferably
wherein the plurality of pre-defined caption
types includes:

an absence of captions;
a predetermined caption, for example an error
or apology caption or a cleardown instruction;
a series of pre-prepared block captions of at
least one type;
a series of pre-prepared short captions;
a sequence of live captions;
an uncertain or unidentified caption type.

5. A method according to Claim 4 wherein more than
one candidate pre-defined caption type is assigned
to the data stream, and preferably wherein an initial
assignment of more than one candidate caption
type is refined based on the history of the data
stream.

6. A method according to any preceding claim wherein
the captions comprise subtitles. 

7. A method according to any of Claims 1 to 5 wherein
the captions comprise information for an access
service other than subtitles, preferably wherein the 
captions comprise audio description information or 
closed signing. 

8. A method according to any preceding claim wherein 
more than one type of caption is analysed, if 
present. 

9. A method according to Claim 7 wherein a caption 
type is assigned signifying detection of presence or 
absence of the captions in place of detection of a 
quantitative measure. 

10. A method of processing a data stream in which a 
non-textual access service may be transmitted, the 
method comprising determining variation in infor- 
mation content used for transmission of the access 
service and classifying the data stream based on 
the variation. 

11. A method according to any preceding claim wherein
the data stream comprises a television signal, pref- 
erably wherein a programme type is assigned to the 
data stream based on the caption type assigned to 
the data stream or the classification of the data 
stream, and further preferably wherein the pro- 
gramme type classification comprises at least one 
of: 

no subtitles; 
pre-recorded; 
as live; 

live broadcast. 

12. A method according to Claim 11 wherein the pro- 
gramme type is classified as pre-recorded and 
wherein the pre-recorded programme is further 
subclassified. 

13. A method according to Claim 11 or 12 further comprising
detecting at least one interstitial between
caption types in the data stream, preferably wherein 
the interstitial comprises a change between pro- 
grammes transmitted or an advertisement break 
within a programme. 

14. A method according to any preceding claim wherein
a sample of quasi instantaneous information rate is 
detected by determining a measure of information 
in the first sampling interval and an average infor- 
mation rate is determined over a second, longer, av- 
eraging period from a plurality of samples. 

15. A method according to any preceding claim further
comprising monitoring programme output based on
analysis of captions or classification based on a
non-textual access service, and preferably further
comprising reporting the results of monitoring a plurality
of channels and/or programmes to a user interface
and preferably further comprising triggering
an alarm based on the results of monitoring, and
suitably wherein a live alarm is triggered during
broadcast of a monitored programme.

16. A method according to Claim 15, further comprising
compiling a report on broadcast of a plurality of
programmes following broadcast.

17. A method of classifying a programme, the method
comprising receiving a signal encoding the programme
and a series of subtitles, determining at
least one quantitative characteristic of variation in
the information content of the series of subtitles and
classifying the programme into one of a series of
predefined programme types based on the or each
characteristic.

18. A method according to Claim 17 wherein the subti- 
tles are bit mapped images. 

19. A method according to Claim 17 or 18 wherein said
at least one quantitative characteristic includes a
measure of bit rate associated with said caption.

20. A method according to any of Claims 17 to 19
wherein the quantitative characteristic includes at
least one statistical measure of the information content,
and is preferably a combination of at least two
statistical measures.

21. A method according to any of Claims 17 to 20
wherein the quantitative characteristic is above or
below a predetermined threshold and wherein only
one statistical measure of the information content
is used to classify the programme.

22. A method according to Claim 20 or 21 as dependent
on Claim 19 wherein the statistical measures include
at least one of:

the average bit rate;
the number of contiguous seconds of bit rate
below or above a threshold;
the number of contiguous seconds of zero bit
rate;
the histogram peak bit rate for a period;
the histogram peak, over a predetermined period,
of the number of contiguous seconds of
zero bit rate;
the accumulated bit rate.

23. A method according to any of Claims 17 to 22 further
comprising deriving a measure of confidence
for the determined programme classification, preferably
wherein the measure of confidence is updated
as more data is received in the signal.

24. A method according to any of Claims 17 to 23
wherein the determined classification has hysteresis
characteristics and is preferably partly determined
based on the history of the quantitative characteristic,
preferably wherein sudden and/or temporary
changes in the determined programme classification
are suppressed.

25. A method according to any of Claims 17 to 24
wherein an initial set of parameters are stored for
classifying the programme, preferably wherein the
initial parameters are modified based on the received
data.

26. A method according to any of Claims 17 to 25 further
comprising monitoring the content of the programme
viewed based on the programme type assigned
to the data stream.

27. A method according to any of Claims 17 to 26 further
comprising detecting at least one interstitial between
determined programme types in the data
stream, preferably wherein detecting the interstitial
further comprises determining the nature of the
interstitial, and preferably further comprising controlling
storage or viewing of a programme based on
detection of an interstitial.

28. A method according to any of Claims 17 to 27 further
comprising receiving further input for use in
determining the programme type, and preferably further
comprising comparing the determined programme
type to the expected programme type.

29. A method according to any of Claims 17 to 28 further
comprising storing a programme based on the
programme type determined.

30. A method according to any of Claims 17 to 29 further
comprising controlling storage of programmes
based on the determined programme type, and
preferably further comprising storing and updating
a list of user preferences based on the determined
programme type.

31. Apparatus for processing a data stream in which
captions are transmitted, the apparatus comprising:

means for receiving the data stream;
means for determining at least one quantitative
characteristic of variation in information content
used for transmission of the captions;
means for assigning a caption type to the data
stream based on the or each characteristic.

32. Apparatus according to Claim 31 wherein the captions
are transmitted as a series of images.

33. Apparatus according to Claim 31 or 32 wherein said
at least one quantitative characteristic includes a
measure of bit rate associated with said captions,
preferably further comprising means for storing the
determined bit rate value for a predetermined
initialisation period, and preferably wherein the
quantitative characteristic is at least one of:

the average bit rate;
the number of contiguous seconds of bit rate
above or below a threshold;
the number of contiguous seconds of zero bit
rate;
the histogram peak bit rate for a period;
the histogram peak, over a predetermined period,
preferably the initialisation period, of the
number of contiguous seconds of zero bit rate;
the accumulated bit rate.

34. Apparatus according to any of Claims 31 to 33 
wherein the data stream is a digital television signal 
and wherein the captions comprise subtitles. 

35. Apparatus according to any of Claims 31 to 33 
wherein the captions comprise a non-textual ac- 
cess service, preferably wherein a binary determi- 
nation is used in place of said quantitative measure. 

36. Apparatus according to Claim 34 or 35 further com- 
prising means for determining a programme type 
based on the assigned caption type for the data 
stream. 

37. Apparatus according to any of Claims 31 to 35 
wherein the caption type determined is at least one 
of: 

an absence of captions; 

a predetermined caption, for example an error 
or apology caption or a cleardown instruction; 
a series of pre-prepared block captions of at 
least one type; 

a series of pre-prepared short captions; 

a sequence of live captions;

an uncertain or unidentified caption type. 

38. Apparatus according to any of Claims 31 to 37 fur- 
ther comprising video storage means. 

39. Apparatus according to any of Claims 36 or 38 fur- 
ther comprising means for storing a list of user pref- 
erences based on the determined programme types 
for the programmes viewed, and preferably further 
comprising means for updating the user preference 
list according to the programme type viewed.

40. Apparatus, a computer program or computer program
means for carrying out a method according to
any of Claims 1 to 30.

41. Apparatus for analysing programme output comprising
means for analysing captions or an access
service in conjunction with other programme information
identifying expected programme output and
means for reporting a potential discrepancy between
the results of analysis of the captions or access
service and the expected programme output.